A method for predicting disease subtypes in presence of misclassification among training samples using gene expression: application to human breast cancer

Authors

  • Wensheng Zhang
  • Romdhane Rekaya
  • Keith Bertrand

Abstract

MOTIVATION: Accurate diagnosis and prediction cannot be achieved unless the disease subtype of every training sample used in the supervised learning step is known accurately. That assumption requires a perfect tool for disease diagnosis and classification, which is seldom available. The supervised learning step therefore has to be carried out with a statistical model that accounts for, and handles, potential mislabeling in the input data.

RESULTS: A procedure was proposed for handling potential mislabeling among training samples when predicting disease subtypes from gene expression data. A simulation study based on real data was conducted for the estrogen receptor status (ER+/ER-) of breast cancer patients. When 1-4 of the training samples (N = 30) were artificially mislabeled, the proposed method was able not only to correct the ER status of the mislabeled training samples but also, more importantly, to predict the ER status of validation samples as accurately as when the 'true' training data were used.
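The abstract does not describe the model itself. What follows is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It substitutes a common textbook technique: logistic regression fitted by EM under a symmetric label-flip model, in which each observed training label is assumed to be wrong with a known probability `eps`. The E-step applies Bayes' rule, P(true = 1 | x, y_obs) ∝ P(y_obs | true = 1) · P(true = 1 | x). The M-step then refits a ridge-penalized logistic regression to those soft labels. A training sample is flagged as suspected mislabeled when its posterior disagrees with its observed label. The function names, `eps`, `lam`, and the single synthetic "marker gene" with 30 samples (two of them deliberately mislabeled) are all hypothetical.

```python
import numpy as np


def sigmoid(z):
    # Clip scores so np.exp cannot overflow for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -35.0, 35.0)))


def add_intercept(X):
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_soft_logistic(X, targets, lam=1.0, n_newton=25):
    """Ridge-penalized logistic regression on soft targets in [0, 1], fitted by Newton steps."""
    Xb = add_intercept(X)
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(n_newton):
        p = sigmoid(Xb @ w)
        grad = Xb.T @ (targets - p) - penalty @ w
        hess = Xb.T @ (Xb * (p * (1.0 - p))[:, None]) + penalty
        w = w + np.linalg.solve(hess, grad)
    return w


def predict_proba(w, X):
    """P(true subtype = 1 | expression profile) under the fitted model."""
    return sigmoid(add_intercept(X) @ w)


def em_with_label_noise(X, y_obs, eps=0.1, lam=1.0, n_em=30):
    """Hypothetical EM fit: observed labels are assumed flipped with known probability eps."""
    y_obs = np.asarray(y_obs, dtype=float)
    r = y_obs.copy()  # initial guess: trust every observed label
    for _ in range(n_em):
        w = fit_soft_logistic(X, r, lam)  # M-step on the current soft labels
        p = predict_proba(w, X)  # P(true = 1 | x)
        lik1 = np.where(y_obs == 1, 1.0 - eps, eps)  # P(observed label | true = 1)
        lik0 = np.where(y_obs == 0, 1.0 - eps, eps)  # P(observed label | true = 0)
        r = p * lik1 / (p * lik1 + (1.0 - p) * lik0)  # E-step: P(true = 1 | x, y_obs)
    suspects = np.flatnonzero((r > 0.5) != (y_obs == 1))
    return w, r, suspects


# Hypothetical demo: one synthetic marker gene, 15 samples of class 0 (ER-) and 15 of class 1 (ER+).
x = np.concatenate([np.linspace(-4.0, -2.0, 15), np.linspace(2.0, 4.0, 15)])
X = x.reshape(-1, 1)
y_true = np.repeat([0, 1], 15)
y_obs = y_true.copy()
y_obs[[3, 20]] = 1 - y_obs[[3, 20]]  # artificially mislabel two training samples

w, r, suspects = em_with_label_noise(X, y_obs, eps=0.1)
print("suspected mislabeled training samples:", suspects)
print("P(class 1) for two new samples:", predict_proba(w, np.array([[-3.0], [3.0]])))
```

Under this model, a sample is flagged exactly when the fitted probability of its observed label drops below `eps`. In effect, `eps` encodes a prior belief about how unreliable the labels are. Fixing `eps` keeps the sketch short, but it could instead be re-estimated in each M-step. Real expression profiles with thousands of genes per sample would also need stronger regularization or gene selection before a model like this is usable.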

Similar sources

Presence of Human Papillomavirus -16 and -18 Among Women with Breast Cancer in Isfahan Province

Background: Various studies have proposed viral infection as a possible cause of human breast cancer. However, the data on the association between viruses and cancer are inconsistent. This study was conducted to determine whether HPV DNA is present in breast cancer tissue samples from Isfahan province. Materials and Methods: Formalin-fixed, paraffin-embedded specimens were prepared from 40 brea...

The relationship between Human Papillomavirus and Epstein-Barr virus infections with breast cancer of Iranian patients

Background: Breast cancer is a malignancy of humans and other mammals. Several risk factors, such as higher hormone levels and obesity, are involved in its development. The identification of a mouse mammary tumor virus supports a viral etiology for breast tumors in animals. Viruses have been implicated in the development of various cancers, but viral induction of breast cancer is contro...

The relationship between Epstein-Barr virus and breast cancer

Background and purpose: Breast cancer is one of the most common malignancies in women and early diagnosis of this cancer is a key element for successful treatment. Breast cancer is a multistep disease in which a virus can play a role. Epstein-Barr Virus (EBV) is identified as an important factor in human cancer. This study investigated the relationship between EBV and breast cancer. Materials ...

Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data

Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of patients. Gene expression data obtained using DNA microarrays, in combination with machine learning algorithms, can provide new and intelligent methods for the diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...

The role of ubiquitin D (UBD) gene expression in breast cancer prognosis

Background: Breast cancer is the most common non-skin cancer among women and the second leading cause of cancer-related death in women. Ubiquitin and ubiquitin-like proteins are members of signal transduction pathways with several cellular functions. Ubiquitin-like protein D (UBD) has been shown to accelerate cancer progression. The aim of this study is the evaluation of UBD gene ...

Journal title:
  • Bioinformatics

Volume 22, Issue 3

Publication date: 2006